GPU Accelerator


The Odious Comparisons Of GPU Inference Performance And Value

#artificialintelligence

While AI training dims the lights at hyperscalers and cloud builders and costs billions of dollars a year, in the long run there will be a whole lot more aggregate processing done on AI inference than on AI training. Inference might require 2X to 3X more compute capacity soon, and anywhere from 10X to 100X more within a decade. What we all do suspect, however, is that there will be relatively few heavy-duty AI training devices and the platforms that use them, and myriad AI inference devices. And so the relative performance and price/performance of the compute engines that run inference are going to be important as they are deployed at scale. Meta Platforms helped invent many of the machine learning techniques and technologies that are being deployed in production these days, and it was no surprise to us that the company had created a unified inference framework, called AITemplate, which it open sourced and described earlier this month in a Meta AI engineering blog post.


SambaNova Doubles Up Chips To Chase AI Foundation Models

#artificialintelligence

One of the first tenets of machine learning, which is a very precise kind of data analytics and statistical analysis, is that more data beats a better algorithm every time. A consensus is emerging in the AI community that a large foundation model with hundreds of billions to trillions of parameters is going to beat a highly tuned model trained on a small subset of relevant data every time. If this turns out to be true, it will have significant implications for AI system architecture, as well as for who will be able to afford to run such ginormous foundation models in production. Our paraphrasing of "more data beats a better algorithm" is a riff on a quote from Peter Norvig, an education fellow at Stanford University and a researcher and engineering director at Google for more than two decades, who co-authored the seminal paper "The Unreasonable Effectiveness of Data" back in 2009. That was long before machine learning went mainstream, but big data was already amassing, changing the nature of data analytics and giving great power to the hyperscalers who gathered it as part of the services they offered customers. "But invariably, simple models and a lot of data trump more elaborate models based on less data," Norvig wrote, and since that time he has been quoted saying something else: "More data beats clever algorithms, but better data beats more data."
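The "simple model plus more data" claim is easy to test for yourself. Below is a minimal sketch using scikit-learn on synthetic data, pitting a fancier model trained on 500 examples against plain logistic regression trained on 40,000. Which side wins depends on the problem, so treat this as the shape of the experiment rather than proof of the principle; the dataset parameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification problem: 50,000 examples, 40 features.
X, y = make_classification(n_samples=50_000, n_features=40,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0)

# "Better algorithm, less data": an ensemble model on only 500 examples.
small = GradientBoostingClassifier().fit(X_train[:500], y_train[:500])

# "Simple model, more data": plain logistic regression on all 40,000 examples.
big = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("ensemble on 500 examples: ", small.score(X_test, y_test))
print("logistic on 40k examples: ", big.score(X_test, y_test))
```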


Nvidia Will Be A Prime Contractor For Big AI Supercomputers

#artificialintelligence

Normally, when we look at a system, we start with the compute engines at a very fine level of detail and then work our way out across the intricacies of the nodes, and then the interconnect and software stack that scales it across the nodes into a distributed computing platform. But this time, as we go over the many announcements that Nvidia is making at its GPU Technology Conference 2022 online event, we want to start at the middle layer, where the nodes meet the network, and work our way up, because this is what makes Nvidia a real contender as a high performance computing system maker – meaning machines designed to run AI, HPC, and data analytics workloads, not just traditional HPC simulation and modeling. In fact, we think the innovations unleashed at GTC 2022 this year are going to make Nvidia one of the key prime contractors for such systems operating at exascale and beyond. To play that game, you have to have architecture and deep pockets, and Nvidia clearly has both. With IBM basically out of the game, capability-class supercomputers are coming down to Hewlett Packard Enterprise, Nvidia, Fujitsu (the latter being pretty much focused on RIKEN Lab in Japan and a few other centers that buy chips off the "K" and "Fugaku" blocks), and Atos (which is doing a lot of business with its BullSequana systems in Europe).


What's New In Gartner's Hype Cycle For AI, 2020

#artificialintelligence

Chatbots are projected to see more than a 100% increase in their adoption rates in the next two to five years and are the leading AI use case in enterprises today. Gartner revised the bots' penetration rate from a range of 5% to 20% last year to 20% to 50% this year. Gartner points to chatbots' successful adoption as the face of AI today and to the technology's contributions to streamlining automated, touchless customer interactions aimed at keeping customers and employees safe. Bot vendors to watch include Amazon Web Services (AWS), Cognigy, Google, IBM, Microsoft, NTT DOCOMO, Oracle, Rasa and Rulai. GPU accelerators are the nearest-term technology to mainstream adoption and are predicted to deliver a high level of benefit, according to Gartner's Priority Matrix for AI, 2020.


Nvidia brings Ampere A100 GPUs to Google Cloud

#artificialintelligence

Just over a month after announcing its latest-generation Ampere A100 GPU, Nvidia said this week that the powerhouse processor is now available on Google Cloud. The A100-based Accelerator Optimized VM A2 instance family is designed for enormous artificial intelligence workloads and data analytics. Nvidia says users can expect substantial improvements over previous models, in this instance up to a 20-fold performance boost. The Nvidia Ampere A100 is the largest 7-nanometer chip ever constructed. It sports 54 billion transistors and offers innovative features such as multi-instance GPU, automatic mixed precision, NVLink that doubles GPU-to-GPU direct bandwidth, and faster memory reaching 1.6 terabytes per second.
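One of those features, automatic mixed precision, is straightforward to exercise from PyTorch. The sketch below is a generic training loop using PyTorch's torch.cuda.amp APIs, not anything specific to the A2 instances; the tiny linear model and random data are placeholders, and a CUDA-capable GPU such as an A100 is assumed to be available.

```python
import torch

# Placeholder model and data, just to exercise the AMP machinery.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales the loss so FP16 gradients don't underflow

data = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # eligible ops run in reduced precision
        loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients, then applies the update
    scaler.update()
```

The GradScaler step matters because FP16 has a narrow exponent range; without loss scaling, small gradients can round to zero and training silently stalls.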


Programming In The Parallel Universe

#artificialintelligence

This week is the eighth annual International Workshop on OpenCL, SYCL, Vulkan, and SPIR-V, and the event is available online for the very first time in its history thanks to the coronavirus pandemic. One of the event organizers, and the conference chair, is Simon McIntosh-Smith, who is a professor of high performance computing at the University of Bristol in Great Britain and also the head of its Microelectronics Group. Among other things, McIntosh-Smith was a microprocessor architect at STMicroelectronics, where he designed SIMD units for the dual-core, superscalar Chameleon and SH5 set-top box ASICs back in the late 1990s. In 1999, McIntosh-Smith moved to Pixelfusion, which created the first general purpose GPU – arguably eight or nine years before Nvidia did it with its Tesla GPUs – and where he was an architect on the 1,536-core chip and software manager for two years. In 2002, McIntosh-Smith was one of the co-founders of ClearSpeed, which created floating point math accelerators used in HPC systems before GPU accelerators came along, and he was first director of architecture and applications and then vice president of applications.
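For readers who have never touched the standards the workshop covers, the canonical OpenCL starter exercise is a vector add: one work-item per array element. A minimal sketch via the PyOpenCL bindings (assuming an OpenCL runtime and device are installed) looks like this:

```python
import numpy as np
import pyopencl as cl

# Set up a context and command queue on whatever OpenCL device is available.
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel itself is plain OpenCL C: one work-item per element.
program = cl.Program(ctx, """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

program.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a + b)
```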


China's Top Server-Builders Adopt NVIDIA AI Design for Cloud Computing

#artificialintelligence

BEIJING, CHINA--(Marketwired - Sep 25, 2017) - GTC China - NVIDIA (NASDAQ: NVDA) today announced that China's leading original equipment manufacturers (OEMs) -- including Huawei, Inspur and Lenovo -- are using the NVIDIA HGX reference architecture to offer Volta architecture-based accelerated systems for hyperscale data centers. Through the NVIDIA HGX Partner Program, NVIDIA is providing each OEM with early access to the NVIDIA HGX reference architecture for data centers, NVIDIA GPU computing technologies, and design guidelines. HGX is the same data center design used in Microsoft's Project Olympus initiative, Facebook's Big Basin systems and NVIDIA DGX-1 AI supercomputers. Using HGX as a starter "recipe," OEM and original design manufacturer (ODM) partners can work with NVIDIA to more quickly design and bring to market a wide range of qualified GPU-accelerated AI systems for hyperscale data centers to meet the industry's growing demand for AI cloud computing. With GPUs based on the NVIDIA Volta architecture offering three times the performance of its predecessor, manufacturers can meet market demand with new products based on the latest NVIDIA technology.


Mindtech Announces Enhanced Scalability Support for Chameleon

#artificialintelligence

Mindtech Global Ltd, a UK-based start-up, has announced container support that allows running multiple instances of its Chameleon synthetic data generation tool on clustered platforms. This approach enables full use of available compute resources, leading to a significant reduction in training cycle times for neural networks that use synthetic data. Mindtech's Chameleon was built from the outset to scale for generating large amounts of data to meet the demands of training visual neural networks for AI systems. The simulator incorporates a scripting engine that makes it easy to run repeated simulations, changing a single variable at a time. The scenario editor is intended to allow the quick and easy creation of multiple different sequences, to create the data required for the intended use case.
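Mindtech has not published Chameleon's scripting interface in this announcement, so the following is a purely hypothetical illustration of what a one-variable-at-a-time sweep driver for containerized runs might look like; the container image name, flags, scenario file, and parameters are all invented stand-ins, and the script only prints the invocations a cluster scheduler would fan out.

```python
import shlex

# Hypothetical baseline scenario and the variables to sweep, one at a time.
baseline = {"weather": "clear", "time_of_day": "noon", "camera_height_m": 1.5}
sweeps = {
    "weather": ["rain", "fog", "snow"],
    "time_of_day": ["dawn", "dusk", "night"],
}

jobs = []
for var, values in sweeps.items():
    for value in values:
        params = dict(baseline, **{var: value})  # change exactly one variable
        cmd = ["docker", "run", "--rm", "chameleon:latest",  # invented image/CLI
               "--scenario", "warehouse.scn",
               "--out", f"/data/{var}_{value}"]
        for key, val in params.items():
            cmd += ["--set", f"{key}={val}"]
        jobs.append(cmd)

# Dry run: print one container invocation per run in the sweep.
for cmd in jobs:
    print(shlex.join(cmd))
```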


HPE Deploys TX-GAIA Supercomputer at MIT Lincoln Laboratory - insideHPC

#artificialintelligence

Today HPE announced the deployment of a new supercomputer at the MIT Lincoln Laboratory Supercomputing Center for compute-intensive AI applications, bolstering research across engineering, science, and medicine. Called TX-GAIA (Green AI Accelerator), the new supercomputer converges HPC and AI to support workloads such as modeling and simulation and to train complex deep neural networks (DNNs) and other machine learning models. It is based on the HPE Apollo 2000 system, which is purpose-built for HPC and optimized for AI, integrating the latest Intel Xeon Scalable processors and NVIDIA GPU accelerators. "At the MIT Lincoln Laboratory Supercomputing Center, our mission is to solve the nation's hardest technical challenges by advancing computationally intensive science, engineering, and medicine," said Jeremy Kepner, founder and head of the MIT Lincoln Laboratory Supercomputing Center (LLSC). "By collaborating with HPC leaders like HPE, we are expanding technical capabilities to run emerging AI workloads in our supercomputer and accelerate innovation." The new supercomputer has a measured performance of 4.725 petaflops and will be used to support research projects that will fuel innovation in weather forecasting, medical data analysis, autonomous systems, synthetic DNA design, and new materials and devices. Additionally, the new system gets an AI performance boost, as measured by the computing speed required to run DNNs, with a peak performance of 100 AI petaflops. This will greatly accelerate the processing of deep neural networks and other compute-intensive AI workloads in order to improve training in areas such as image recognition, speech and natural language processing, and computer vision. The TX-GAIA system comprises nearly 900 Intel processors and 900 NVIDIA GPU accelerators. The new system is housed in a modular data center facility, co-developed with HPE and designed to speed deployment and reduce overall IT resources. It is located in Holyoke, Massachusetts, where it is powered by abundant green energy, and will go into production in the fall of 2019. "We've seen strong industry demand for scalable performance to train higher volumes of AI that will advance science and engineering, and make breakthroughs across industries," said Bill Mannel, vice president and general manager, HPC and AI, at HPE. "Our continued partnership with MIT Lincoln Laboratory Supercomputing Center extends the power of our HPC technologies to boost AI R&D and create new experiences."
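The quoted figures hang together on a back-of-envelope basis if, as contemporaneous coverage indicated, the roughly 900 GPUs are NVIDIA V100s. The peak throughput numbers below are assumptions used only for this sanity check, not figures from the announcement.

```python
# Sanity-check the TX-GAIA figures (assumes NVIDIA V100 accelerators).
gpus = 900
tensor_peak_tflops = 125.0  # assumed V100 peak FP16 tensor core throughput
fp64_peak_tflops = 7.8      # assumed V100 (SXM2) peak FP64 throughput

ai_peak_pflops = gpus * tensor_peak_tflops / 1000
fp64_peak_pflops = gpus * fp64_peak_tflops / 1000
print(f"AI peak:   {ai_peak_pflops:.1f} PFLOPS (quoted: ~100 AI petaflops)")
print(f"FP64 peak: {fp64_peak_pflops:.1f} PFLOPS (measured: 4.725 petaflops)")
```

The math works out to roughly 112 petaflops of tensor throughput against the quoted 100 AI petaflops, and a measured 4.725 petaflops against about 7 petaflops of FP64 peak, a plausible sustained-to-peak ratio.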


Teasing Out The Bang For The Buck Of Inference Engines

#artificialintelligence

In this case, the benchmarks are for running the GoogLeNet V1 convolutional neural network, with a batch size of 1. (Meaning that items to be identified are sent through in serial fashion rather than batched up to be chewed on all at once.) This network came close to beating humans at image recognition, but it took Microsoft's ResNet in 2015 to accomplish that feat, with a 3.57 percent error rate compared to 5.1 percent for humans. The baseline for performance that Xilinx chose was the smallest F1 FPGA-accelerated instance on the EC2 compute cloud at Amazon Web Services. This instance has a single Virtex UltraScale+ VU9P FPGA, which has 1.182 million LUTs and is attached to a server slice that has eight vCPUs (based on the "Broadwell" Xeon E5-2696 v4 processor) and 122 GB of main memory.
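Batch-size-1 measurements like these are easy to reproduce on whatever hardware is at hand. Below is a minimal latency harness using the GoogLeNet implementation in torchvision; this is a generic sketch, not Xilinx's benchmark methodology, and since weights are irrelevant for timing, none are downloaded.

```python
import time
import torch
import torchvision.models as models

# Batch-1 latency harness; randomly initialized weights are fine for timing.
model = models.googlenet(weights=None, init_weights=True).eval()
x = torch.randn(1, 3, 224, 224)  # a single 224x224 RGB image

with torch.no_grad():
    for _ in range(10):           # warm-up runs, excluded from timing
        model(x)
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    latency_ms = (time.perf_counter() - start) / runs * 1000

print(f"mean batch-1 latency: {latency_ms:.1f} ms "
      f"({1000 / latency_ms:.0f} images/sec)")
```

Because each image goes through alone, this measures latency rather than throughput; batching would raise images per second but is exactly what the batch-size-1 methodology rules out.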